 mode choice


Personalized Decision Modeling: Utility Optimization or Textualized-Symbolic Reasoning

Zhao, Yibo, Zhao, Yang, Du, Hongru, Yang, Hao Frank

arXiv.org Artificial Intelligence

Decision-making models for individuals, particularly in high-stakes scenarios like vaccine uptake, often diverge from population-optimal predictions. This gap arises from the uniqueness of the individual decision-making process, shaped by numerical attributes (e.g., cost, time) and linguistic influences (e.g., personal preferences and constraints). Building upon Utility Theory and leveraging the textual-reasoning capabilities of Large Language Models (LLMs), this paper proposes an Adaptive Textual-symbolic Human-centric Reasoning framework (ATHENA) to address the optimal information integration. ATHENA uniquely integrates two stages: first, it discovers robust, group-level symbolic utility functions via LLM-augmented symbolic discovery; second, it implements individual-level semantic adaptation, creating personalized semantic templates guided by the optimal utility to model personalized choices. Validated on real-world travel mode and vaccine choice tasks, ATHENA consistently outperforms utility-based, machine learning, and other LLM-based models, lifting F1 score by at least 6.5% over the strongest cutting-edge models. Further, ablation studies confirm that both stages of ATHENA are critical and complementary, as removing either clearly degrades overall predictive performance. By organically integrating symbolic utility modeling and semantic adaptation, ATHENA provides a new scheme for modeling human-centric decisions. The project page can be found at https://yibozh.github.io/Athena.


Towards Locally Deployable Fine-Tuned Causal Large Language Models for Mode Choice Behaviour

Alsaleh, Tareq, Farooq, Bilal

arXiv.org Artificial Intelligence

This study investigates the adoption of open-access, locally deployable causal large language models (LLMs) for travel mode choice prediction and introduces LiTransMC, the first fine-tuned causal LLM developed for this task. We systematically benchmark eleven open-access LLMs (1-12B parameters) across three stated and revealed preference datasets, testing 396 configurations and generating over 79,000 mode choice decisions. Beyond predictive accuracy, we evaluate model-generated reasoning using BERTopic for topic modelling and a novel Explanation Strength Index, providing the first structured analysis of how LLMs articulate decision factors in alignment with behavioural theory. LiTransMC, fine-tuned using parameter-efficient and loss-masking strategies, achieved a weighted F1 score of 0.6845 and a Jensen-Shannon Divergence of 0.000245, surpassing both untuned local models and larger proprietary systems, including GPT-4o with advanced persona inference and embedding-based loading, while also outperforming classical mode choice methods such as discrete choice models and machine learning classifiers on the same dataset. This dual improvement, i.e., high instance-level accuracy and near-perfect distributional calibration, demonstrates the feasibility of creating specialist, locally deployable LLMs that integrate prediction and interpretability. Through combining structured behavioural prediction with natural language reasoning, this work unlocks the potential for conversational, multi-task transport models capable of supporting agent-based simulations, policy testing, and behavioural insight generation. These findings establish a pathway for transforming general purpose LLMs into specialized and explainable tools for transportation research and policy formulation, while maintaining privacy, reducing cost, and broadening access through local deployment.
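The abstract reports a Jensen-Shannon Divergence as its measure of distributional calibration, that is, how closely predicted mode shares match observed ones. A minimal sketch of that metric follows. The mode shares, the function name, and the base-2 logarithm are illustrative assumptions and are not taken from the paper, which may use a different logarithm base or library routine.

```python
import math


def jensen_shannon_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two discrete distributions.

    p and q are equal-length sequences of non-negative weights; both are
    normalised to sum to 1 before the divergence is computed.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of categories")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each distribution needs positive total mass")
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # Mixture distribution M = (P + Q) / 2
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i = 0 contribute 0.
        return sum(x * math.log(x / y, base) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical observed vs. predicted mode shares (car, transit, walk, bike).
observed_shares = [0.55, 0.25, 0.15, 0.05]
predicted_shares = [0.54, 0.26, 0.15, 0.05]
jsd_value = jensen_shannon_divergence(observed_shares, predicted_shares)
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (disjoint support), so values near zero indicate that aggregate predicted shares track the observed shares closely.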


MICROTRIPS: MICRO-geography TRavel Intelligence and Pattern Synthesis

Wang, Yangyang, Fabusuyi, Tayo

arXiv.org Artificial Intelligence

This study presents a novel small-area estimation framework to enhance urban transportation planning through detailed characterization of travel behavior. Our approach improves on the four-step travel model by employing publicly available microdata files and machine learning methods to predict travel behavior for a representative, synthetic population at small geographic areas. This approach enables high-resolution estimation of trip generation, trip distribution, mode choice, and route assignment. Validation using ACS/PUMS work-commute datasets demonstrates that our framework achieves higher accuracy compared to conventional approaches. The resulting granular insights enable the tailoring of interventions to address localized situations and support a range of policy applications and targeted interventions, including the optimal placement of micro-fulfillment centers, effective curb-space management, and the design of more inclusive transportation solutions particularly for vulnerable communities.


NestGNN: A Graph Neural Network Framework Generalizing the Nested Logit Model for Travel Mode Choice

Zhou, Yuqi, Cheng, Zhanhong, Hu, Lingqian, Bu, Yuheng, Wang, Shenhao

arXiv.org Machine Learning

Nested logit (NL) has been commonly used for discrete choice analysis, including a wide range of applications such as travel mode choice, automobile ownership, or location decisions. However, the classical NL models are restricted by their limited representation capability and handcrafted utility specification. While researchers introduced deep neural networks (DNNs) to tackle such challenges, the existing DNNs cannot explicitly capture inter-alternative correlations in the discrete choice context. To address these challenges, this study proposes a novel concept, the alternative graph, to represent the relationships among travel mode alternatives. Using a nested alternative graph, this study further designs a nested-utility graph neural network (NestGNN) as a generalization of the classical NL model in the neural network family. Theoretically, NestGNNs generalize the classical NL models and existing DNNs in terms of model representation, while retaining the crucial two-layer substitution patterns of the NL models: proportional substitution within a nest but non-proportional substitution beyond a nest. Empirically, we find that the NestGNNs significantly outperform the benchmark models, particularly the corresponding NL models by 9.2%. As shown by elasticity tables and substitution visualization, NestGNNs retain the same two-layer substitution patterns as the NL model, yet present more flexibility in their model design space. Overall, our study demonstrates the power of NestGNN in prediction, interpretation, and its flexibility of generalizing the classical NL model for analyzing travel mode choice.
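The two-layer substitution pattern mentioned in the abstract can be illustrated with the classical two-level nested logit formula, in which the probability of an alternative is the probability of its nest times the probability of the alternative within that nest. The sketch below implements that textbook formula only. It is not the NestGNN model, and the utilities, nest structure, and scale parameters are hypothetical values chosen for illustration.

```python
import math


def nested_logit_probs(utilities, nests, lambdas):
    """Choice probabilities under a two-level nested logit model.

    utilities: dict mapping alternative -> systematic utility V_i
    nests:     dict mapping nest name -> list of alternatives in that nest
    lambdas:   dict mapping nest name -> scale parameter in (0, 1]
    """
    # Inclusive value (log-sum) of each nest: I_k = log sum_j exp(V_j / lambda_k)
    inclusive = {
        nest: math.log(sum(math.exp(utilities[a] / lambdas[nest]) for a in alts))
        for nest, alts in nests.items()
    }
    # Upper level: P(k) is proportional to exp(lambda_k * I_k)
    denom = sum(math.exp(lambdas[n] * inclusive[n]) for n in nests)
    probs = {}
    for nest, alts in nests.items():
        lam = lambdas[nest]
        p_nest = math.exp(lam * inclusive[nest]) / denom
        within = sum(math.exp(utilities[a] / lam) for a in alts)
        for a in alts:
            # Lower level: P(i | k) is a logit over the scaled utilities
            probs[a] = p_nest * math.exp(utilities[a] / lam) / within
    return probs


# Hypothetical example: car alone in one nest, bus and rail sharing a transit nest.
example_utilities = {"car": 0.5, "bus": 0.2, "rail": 0.4}
example_nests = {"auto": ["car"], "transit": ["bus", "rail"]}
example_lambdas = {"auto": 1.0, "transit": 0.5}
example_probs = nested_logit_probs(example_utilities, example_nests, example_lambdas)
```

In this formula the ratio of bus to rail probability depends only on their own utilities, so changing the car utility shifts both proportionally. The ratio of car to bus, by contrast, changes when the rail utility changes, provided the transit scale parameter is below 1. Setting every scale parameter to 1 recovers the multinomial logit model.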


Evaluating Retrieval-Augmented Generation Strategies for Large Language Models in Travel Mode Choice Prediction

Xu, Yiming, Jiao, Junfeng

arXiv.org Artificial Intelligence

Accurately predicting travel mode choice is essential for effective transportation planning, yet traditional statistical and machine learning models are constrained by rigid assumptions, limited contextual reasoning, and reduced generalizability. This study explores the potential of Large Language Models (LLMs) as a more flexible and context-aware approach to travel mode choice prediction, enhanced by Retrieval-Augmented Generation (RAG) to ground predictions in empirical data. We develop a modular framework for integrating RAG into LLM-based travel mode choice prediction and evaluate four retrieval strategies: basic RAG, RAG with balanced retrieval, RAG with a cross-encoder for re-ranking, and RAG with balanced retrieval and cross-encoder for re-ranking. These strategies are tested across three LLM architectures (OpenAI GPT-4o, o4-mini, and o3) to examine the interaction between model reasoning capabilities and retrieval methods. Using the 2023 Puget Sound Regional Household Travel Survey data, we conduct a series of experiments to evaluate model performance. The results demonstrate that RAG substantially enhances predictive accuracy across a range of models. Notably, the GPT-4o model combined with balanced retrieval and cross-encoder re-ranking achieves the highest accuracy of 80.8%, exceeding that of conventional statistical and machine learning baselines. Furthermore, LLM-based models exhibit superior generalization abilities relative to these baselines. Findings highlight the critical interplay between LLM reasoning capabilities and retrieval strategies, demonstrating the importance of aligning retrieval strategies with model capabilities to maximize the potential of LLM-based travel behavior modeling.


Modeling Urban Transport Choices: Incorporating Sociocultural Aspects

Salazar-Serna, Kathleen, Cadavid, Lorena, Franco, Carlos J.

arXiv.org Artificial Intelligence

By understanding how users decide on their commuting modes, it is possible to identify factors that can be influenced to change travel behavior and promote the adoption of more sustainable transportation modes. Agent-based modeling (ABM) is particularly valuable for this purpose, as it can represent complex systems like transportation and identify emerging collective behaviors resulting from the autonomous decisions of transport users interacting with one another and with the environment (Kagho, Balac, and Axhausen 2020). These capabilities make ABM suitable for analyzing the impacts of transport policies (Wise, Crooks, and Batty 2017). However, the application of ABM in analyzing transport mode choices has been limited and studies have been conducted predominantly in developed countries (Cadavid and Salazar-Serna 2021; Salazar-Serna, Cadavid, Franco, and Carley 2023). The effectiveness of these findings may not extend seamlessly to developing regions due to different contextual policy needs and the distinct ways socioeconomic and cultural factors influence human behavior (Carley 1991; Salazar-Serna et al. 2023). Therefore, policies that have been successful in one setting might not achieve similar outcomes in another. Previous studies in transportation have identified various determinants affecting mode choice. These factors can be grouped into several categories: sociodemographic characteristics such as age, sex, occupation, and income level (Ashalatha et al. 2013); travel habits including distance traveled, travel time, origin-destination pairs, and trip purpose (Madhuwanthi et al. 2016); and attributes of the built environment like design, density, and capacity (Ewing and Cervero 2010). Additionally, attitudes and perceptions regarding transport modes, which cover aspects such as comfort, cost, security, safety, quality, and reliability, play a crucial role (Fu 2021).


Combining data from multiple sources for urban travel mode choice modelling

Grzenda, Maciej, Luckner, Marcin, Zawieska, Jakub, Wrona, Przemysław

arXiv.org Artificial Intelligence

Demand for sustainable mobility is particularly high in urban areas. Hence, there is a growing need to predict when people will decide to use different travel modes, with an emphasis on environmentally friendly travel modes. As travel mode choice (TMC) is influenced by multiple factors, machine learning methods are increasingly used to predict travel mode choices given respondent and journey features. Typically, travel diaries are used to provide core relevant data. However, other features, such as attributes of mode alternatives including, but not limited to, travel times and, in the case of public transport (PT), walking distances, have a major impact on whether a person decides to use a travel mode of interest. Hence, in this work, we propose an architecture of a software platform that performs data fusion, combining data documenting journeys with features calculated to summarise the transport options available for these journeys, the built environment, and environmental factors such as weather conditions that may influence travel mode decisions. Furthermore, we propose various novel features, many of which we show to be among the most important for TMC prediction. We propose how stream processing engines and other Big Data systems can be used for their calculation. The data processed by the platform is used to develop machine learning models predicting travel mode choices. To validate the platform, we propose ablation studies investigating the importance of individual feature subsets calculated by it and their impact on the TMC models built with them. In our experiments, we combine survey data, GPS traces, weather and pollution time series, transport model data, and spatial data of the built environment. The accuracy of TMC models built with the additional features improves by up to 18.2% compared with the use of core survey data only.


Analyzing Transport Policies in Developing Countries with ABM

Salazar-Serna, Kathleen, Cadavid, Lorena, Franco, Carlos

arXiv.org Artificial Intelligence

Deciphering travel behavior and mode choices is a critical aspect of effective urban transportation system management, particularly in developing countries where unique socio-economic and cultural conditions complicate decision-making. Agent-based simulations offer a valuable tool for modeling transportation systems, enabling a nuanced understanding and policy impact evaluation. This work aims to shed light on the effects of transport policies and analyzes travel behavior by simulating agents making mode choices for their daily commutes. Agents gather information from the environment and their social network to assess the optimal transport option based on personal satisfaction criteria. Our findings, stemming from simulating a free-fare policy for public transit in a developing-country city, reveal a significant influence on decision-making, fostering public service use while positively influencing pollution levels, accident rates, and travel speed.


Improving the accuracy of freight mode choice models: A case study using the 2017 CFS PUF data set and ensemble learning techniques

Liu, Diyi, Lim, Hyeonsup, Uddin, Majbah, Liu, Yuandong, Han, Lee D., Hwang, Ho-ling, Chin, Shih-Miao

arXiv.org Artificial Intelligence

The US Census Bureau has collected two rounds of experimental data from the Commodity Flow Survey, providing shipment-level characteristics of nationwide commodity movements, published in 2012 (i.e., Public Use Microdata) and in 2017 (i.e., Public Use File). With this information, data-driven methods have become increasingly valuable for understanding detailed patterns in freight logistics. In this study, we used the 2017 Commodity Flow Survey Public Use File data set to explore building a high-performance freight mode choice model, considering three main improvements: (1) constructing local models for each separate commodity/industry category; (2) extracting useful geographical features, particularly the derived distance of each freight mode between origin/destination zones; and (3) applying additional ensemble learning methods such as stacking or voting to combine results from local and unified models for improved performance. The proposed method achieved over 92% accuracy without incorporating external information, an over 19% increase compared to directly fitting Random Forest models over 10,000 samples. Furthermore, SHAP (Shapley Additive Explanations) values were computed to explain the outputs and major patterns obtained from the proposed model. The model framework could enhance the performance and interpretability of existing freight mode choice models.
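The third improvement, combining results from local and unified models by voting, can be pictured with a simple soft-voting rule that averages the class probabilities produced by each model. This is a generic sketch under stated assumptions and not the authors' implementation: the function name, the mode labels, the probabilities, and the weights are all hypothetical, and the paper may use stacking or a different voting rule.

```python
def soft_vote(model_probs, weights=None):
    """Combine class-probability dicts from several models by weighted averaging.

    model_probs: list of dicts mapping mode label -> predicted probability
    weights:     optional list of non-negative model weights (defaults to equal)
    Returns (winning_label, combined_probabilities).
    """
    if not model_probs:
        raise ValueError("at least one model prediction is required")
    if weights is None:
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Union of labels, so a model that omits a label contributes probability 0.
    labels = sorted(set().union(*model_probs))
    combined = {
        label: sum(w * probs.get(label, 0.0) for w, probs in zip(weights, model_probs)) / total
        for label in labels
    }
    winner = max(combined, key=combined.get)
    return winner, combined


# Hypothetical predictions for one shipment from a commodity-specific
# ("local") model and a model trained on all shipments ("unified").
local_pred = {"truck": 0.60, "rail": 0.30, "air": 0.10}
unified_pred = {"truck": 0.25, "rail": 0.65, "air": 0.10}

equal_choice, equal_probs = soft_vote([local_pred, unified_pred])
weighted_choice, weighted_probs = soft_vote([local_pred, unified_pred], weights=[3.0, 1.0])
```

With equal weights the averaged probabilities favour rail in this made-up case, while weighting the local model three times as heavily favours truck, which shows how the choice of weights determines which model dominates the ensemble.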


Modeling Freight Mode Choice Using Machine Learning Classifiers: A Comparative Study Using the Commodity Flow Survey (CFS) Data

Uddin, Majbah, Anowar, Sabreena, Eluru, Naveen

arXiv.org Artificial Intelligence

This study explores the usefulness of machine learning classifiers for modeling freight mode choice. We investigate eight commonly used machine learning classifiers, namely Naive Bayes, Support Vector Machine, Artificial Neural Network, K-Nearest Neighbors, Classification and Regression Tree, Random Forest, Boosting and Bagging, along with the classical Multinomial Logit model. US 2012 Commodity Flow Survey data are used as the primary data source; we augment it with spatial attributes from secondary data sources. The performance of the classifiers is compared based on prediction accuracy results. The current research also examines the role of sample size and training-testing data split ratios on the predictive ability of the various approaches. In addition, the importance of variables is estimated to determine how the variables influence freight mode choice. The results show that the tree-based ensemble classifiers perform the best. Specifically, Random Forest produces the most accurate predictions, closely followed by Boosting and Bagging. With regard to variable importance, shipment characteristics, such as shipment distance, industry classification of the shipper and shipment size, are the most significant factors for freight mode choice decisions.